Distributed computing in practice: the Condor experience
Abstract
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course traveled by research ideas as they grow into production systems.
Similar resources
Hands-On Experience with Condor, the Advanced Load
These days UNIX workstations are common in both academia and industry. It is very seldom the case that a computer engineer or a computer scientist does not have a UNIX workstation sitting on her/his desk. The users of those workstations often fall under one of two overlapping categories. The first is an administrative user who works on email, paper preparation, code development and debug...
How to measure a large open-source distributed system
How can we measure the impact of an open-source software package over time? When a system has no price, no purchase contracts and no buyers or sellers it can be difficult to judge its impact on the world. To explore this issue, we have instrumented the Condor distributed batch system in a variety of ways and observed its growth to over 50 000 CPUs at over 1000 sites over five years. Instrumenta...
SAMGrid Experiences with the Condor Technology in Run II Computing
SAMGrid is a globally distributed system for data handling and job management, developed at Fermilab for the D0 and CDF experiments in Run II. The Condor system is being developed at the University of Wisconsin for management of distributed resources, computational and otherwise. We briefly review the SAMGrid architecture and its interaction with Condor, which was presented earlier. We then pre...
A performance study of job management systems
Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. Therefore, they are critical for improving the utilization of expensive resources in high-performance computing systems and centers, and an important component of grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to c...
Error Scope on a Computational Grid: Theory and Practice
Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error’s scope, defin...
Journal title:
- Concurrency - Practice and Experience
Volume 17
Publication date: 2005